1 Introduction

Our goal in this paper is to prove concentration inequalities for Lipschitz functions of certain
collections of negatively dependent binary-valued random variables. To illustrate our general methods
we state our main result in a special case that was motivated by a question of E. Mossel (personal
communication).

Theorem 1.1. Let $G = (V,E)$ be a finite connected graph, let $\mathbb{P}$ be the uniform measure on the
spanning trees of $G$, and for $e \in E$ let $X_e$ be the indicator function of the event that $e$ is in the chosen
spanning tree. Let $f : \{0,1\}^E \to \mathbb{R}$ be any function with Lipschitz constant 1. Then



For example we might take $f$ to be one half the number of vertices whose degree in the random
tree is odd. This result is a consequence of more general results stated in Section 3 and Section 5.

1.1 Classical concentration inequalities 

Let $\{X_n : n \ge 1\}$ be independent Bernoulli random variables with respective means $\{p_n\}$. Let
$S_n := \sum_{k=1}^n X_k$ denote the partial sums, $\mu_n := \mathbb{E} S_n = \sum_{k=1}^n p_k$ denote the means and
$V_n := \sum_{k=1}^n p_k(1 - p_k)$ denote the variance of $S_n$. The simple and well known one-sided tail estimate for
$S_n$ is the classical Gaussian bound

$$\mathbb{P}(S_n - \mu_n \ge a) \le e^{-2a^2/n}. \qquad (1.1)$$

Replacing $X_n$ with $1 - X_n$ gives the two-sided bound

$$\mathbb{P}(|S_n - \mu_n| \ge a) \le 2e^{-2a^2/n}. \qquad (1.2)$$

The bound may be found, among other places, in [McD89, Corollary 5.2]. The references given
there include [Hoe63, (2.3)] as well as the celebrated paper [Che52], which proves the result for
identically distributed variables.

When $p_n$ and $1 - p_n$ are bounded away from zero, the variance of $S_n$ is of order $n$ and this
kind of bound is the best one can expect. However, when $n \gg \mu_n$, one might hope for uniformity
in $n$ via bounds in which the exponent depends on $\mu_n$ and not on $n$. For example, if $n \to \infty$ with
$\max_{j \le n} p_j \to 0$ and $\mu_n \to \mu$, the sum $S_n$ converges to a Poisson. The upper tail of a Poisson is not
as thin as a Gaussian, being $\exp[-\Theta(a \log(a/\mu))]$ rather than $\exp[-\Theta(a^2/\mu)]$. The bound

(1.3)

is proved in [Hoe63, Theorem 1] and achieves the Poissonian upper tail.
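The Gaussian bound of this subsection, read here as $\mathbb{P}(S_n - \mu_n \ge a) \le e^{-2a^2/n}$, can be compared with an exact binomial tail. The following is a minimal sketch, assuming i.i.d. Bernoulli(1/2) summands; the function names are ours and purely illustrative.

```python
from math import comb, exp

def binomial_upper_tail(n, p, threshold):
    """Exact P(S_n >= threshold) for S_n ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n + 1) if k >= threshold)

def gaussian_bound(n, a):
    """The classical one-sided bound exp(-2 a^2 / n)."""
    return exp(-2 * a * a / n)

n, p = 100, 0.5
mu = n * p
for a in (5, 10, 20):
    # Print the deviation, the exact tail and the bound side by side.
    print(a, binomial_upper_tail(n, p, mu + a), gaussian_bound(n, a))
```

In each row the exact tail should sit below the bound; the gap widens as $a$ grows.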
1.2 Generalizations

Our aim is to generalize (1.1) or its Poissonian version (1.3) in two ways. Instead of $S_n$ we consider
arbitrary Lipschitz functions of $X_1, \ldots, X_n$, and instead of independent Bernoullis we consider a
more general negatively dependent collection of binary random variables. We will give a number of 
applications, but before this, we briefly discuss what is known about each of the two generalizations 
separately. 

For the first generalization, let $\mathcal{B}_n$ denote the rank-$n$ Boolean lattice $\{0,1\}^n$ and let $f : \mathcal{B}_n \to \mathbb{R}$
be Lipschitz with respect to the Hamming distance. Replacing $f$ by $f/c$ if necessary, we will lose no
generality in assuming our Lipschitz functions to have Lipschitz constant 1, and we do so hereafter;
thus $|f(x) - f(x')| \le 1$ whenever $x$ and $x'$ are two strings differing in only one position.

Denoting $f(X_1, \ldots, X_n)$ by $f_n$, most proofs of (1.1) actually prove the generalization

$$\mathbb{P}(f_n - \mathbb{E} f_n \ge a) \le e^{-2a^2/n}. \qquad (1.4)$$

Replacing $f$ by $-f$ generalizes (1.2) to the two-sided bound $\mathbb{P}(|f_n - \mathbb{E} f_n| \ge a) \le 2e^{-2a^2/n}$.

For the second generalization, we say that a collection of random variables $\{X_j\}$ in $\{0,1\}$ is
negative cylinder dependent if, for every set $S$ of indices,

$$\mathbb{P}(X_j = 1 \text{ for all } j \in S) \le \prod_{j \in S} p_j \qquad (1.5)$$

and

$$\mathbb{P}(X_j = 0 \text{ for all } j \in S) \le \prod_{j \in S} (1 - p_j). \qquad (1.6)$$

Negative cylinder dependence implies the inequalities (1.1)-(1.2): this may be found for instance
in [PS97, Theorem 3.4] by setting $\lambda = 1$.

It is not known whether these two generalizations can be combined. The random variables $\{X_n\}$
are said to be negatively associated if $\mathbb{E} fg \le (\mathbb{E} f)(\mathbb{E} g)$ for every pair $f, g$ of increasing functions
on $\{0,1\}^n$ such that $f(X_1, \ldots, X_n)$ depends only on the values $\{X_i : i \in S\}$ and $g(X_1, \ldots, X_n)$
depends only on the values $\{X_i : i \notin S\}$, for some subset $S \subseteq \{1, \ldots, n\}$. By induction, this implies
the weaker property of negative cylinder dependence. In correspondence with E. Mossel (personal
communication) the following conjecture arose.

Conjecture 1.2. Let $X_1, \ldots, X_n$ be negatively associated binary-valued random variables. Let
$f : \{0,1\}^n \to \mathbb{R}$ be Lipschitz-1 and denote $f_n := f(X_1, \ldots, X_n)$. Then (1.4) holds with the bound
$\exp(-2a^2/n)$ replaced by $c_0 \exp(-c a^2/n)$ for some positive constants $c_0$ and $c$.

To see why the exponent must be weakened consider the example of Bernoulli random variables
$X_1, \ldots, X_n$ with $n$ even, $\{X_1, \ldots, X_{n/2}\}$ independent with mean 1/2, and $X_{n/2+j} = 1 - X_j$
for $1 \le j \le n/2$. These are negatively associated and yet the Lipschitz-1 function
$f := \sum_{j=1}^{n/2} X_j - \sum_{j=n/2+1}^{n} X_j$ has tail probabilities on the order of $e^{-a^2/n}$. It is possible that this is the worst example
and that the result is true with $c = 1$, but a resolution of the conjecture would be interesting even
without the optimal value of $c$.
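The example can be checked numerically. With $X_{n/2+j} = 1 - X_j$ the function $f$ equals $2B - n/2$, where $B$ is Binomial($n/2$, 1/2), so its tail is an exact binomial sum. The sketch below, with illustrative names of our own, compares that tail with the exponent $2a^2/n$ of (1.4) and with the weakened exponent $a^2/n$.

```python
from math import comb, exp

def tail_of_f(n, a):
    """Exact P(f >= a) for f = 2*B - n/2, B ~ Binomial(n/2, 1/2); here E f = 0."""
    m = n // 2
    favourable = sum(comb(m, b) for b in range(m + 1) if 2 * b - m >= a)
    return favourable / 2**m

n, a = 100, 20
exact = tail_of_f(n, a)
print(exact, exp(-2 * a * a / n), exp(-a * a / n))
```

For these values the exact tail exceeds $e^{-2a^2/n}$ but stays below $e^{-a^2/n}$, which is the sense in which the exponent must be weakened.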

Recent investigation into negative association and other negative dependence properties indicates
that negative association may not be sufficiently robust to use as a hypothesis in this kind of context.
The problem was posed in [Pem00] to find a more useful and natural negative dependence property;
this was answered in [BBL09], who showed that the strong Rayleigh property implies negative
association and many other desirable consequences and is stable under probabilistic operations such
as conditioning, symmetrizing and reweighting.

Our main result implies that Conjecture 1.2 holds with $c = 1/8$ if one assumes the strong
Rayleigh property rather than just negative association. The strong Rayleigh property is known 
to hold for most standard examples in which negative association is known to hold, so this gives 
up little generality, and moreover the strong Rayleigh property is usually easier to check than is 
negative association. Indeed for some of the measures described below, the only way we know they 
are negatively associated is by establishing the strong Rayleigh property. Several classes of measure 
satisfying the strong Rayleigh property are: 

• Determinantal measures and point processes; 

• Bernoullis conditioned on the sum;

• Measures obtained by running exclusion dynamics from a deterministic starting state (or more 
generally, exclusion with birth and death). 

An overview of the rest of the paper is as follows. In the next section we introduce the strong 
Rayleigh property and discuss its consequences. One important consequence for us will be the 
stochastic covering property, which is all we use to derive our basic concentration inequality. 
In Section 3 we state our results, and these are proved in Section 4. Section 5 contains a number of
applications.

2 Strong Rayleigh property, stochastic covering property, 
and other negative dependence conditions 

Let $[n]$ denote $\{1, \ldots, n\}$ and let $\mathcal{B}_n := \{0,1\}^n$ denote the Boolean lattice of rank $n$, with
coordinatewise partial order. The function $N : \mathcal{B}_n \to \mathbb{Z}$ will be used throughout to denote the counting
function defined by $N(\omega) := \sum_{j=1}^n \omega_j$. A measure $\nu$ on $\mathcal{B}_n$ is said to be $k$-homogeneous if $\nu$ is
supported on the set $\{\omega : N(\omega) = k\}$. The probability measure $\nu$ on $\mathcal{B}_n$ is said to be negatively
associated if $\int fg \, d\nu \le (\int f \, d\nu)(\int g \, d\nu)$ for every pair of nonnegative monotone functions $f$ and $g$
such that for some set $S \subseteq [n]$, the function $f$ depends only on coordinates $\{\omega_j : j \in S\}$ while the
function $g$ depends only on coordinates $\{\omega_j : j \in S^c\}$.

The strong Rayleigh condition is said to hold for a measure $\mathbb{P}$ on $\mathcal{B}_n$ if the generating function

$$F(z_1, \ldots, z_n) := \sum_{\omega \in \mathcal{B}_n} \mathbb{P}(\omega) \prod_{j=1}^n z_j^{\omega_j}$$

has no roots $(z_1, \ldots, z_n)$ all of whose coordinates lie in the (strict) upper half plane. This and
many consequences are given in [BBL09], including (implicitly) the stochastic covering property
(see Proposition 2.2), which was conjectured [Pem00, Conjecture 9] to follow from something a little
weaker. Some of the relevant implications are summarized in Figure 1 below.

The definition of the stochastic covering property requires a few preliminary definitions. Recall
that a measure $\nu$ on a partially ordered set is said to stochastically dominate a measure $\rho$, denoted
$\nu \succeq \rho$, if $\nu(A) \ge \rho(A)$ for every upwardly closed set $A$. An equivalent condition is that there exists
a coupling, that is a measure $Q$ on $\mathcal{B}_n \times \mathcal{B}_n$ with respective marginals $\nu$ and $\rho$, supported on the
set $\{(x,y) : x \ge y\}$. If $\mathbb{P}$ is a measure on $\mathcal{B}_n$ making the coordinate variables $\{X_i : 1 \le i \le n\}$
negatively associated, an immediate consequence of negative association is that the conditional
measure $(\mathbb{P} \mid X_n = 0)$ on $\mathcal{B}_{n-1}$ stochastically dominates the conditional measure $(\mathbb{P} \mid X_n = 1)$.

[Figure 1 (implication diagram): strong Rayleigh ⟹ projected homogeneous Rayleigh ⟹ negative association ⟹ negative cylinder dependence; projected homogeneous Rayleigh ⟹ stochastic covering property.]

Figure 1: relations among negative dependence properties

We say that the probability measure $\nu$ on $\mathcal{B}_n$ stochastically covers another probability measure
$\rho$ if there is a measure on $\mathcal{B}_n \times \mathcal{B}_n$ with first marginal $\nu$ and second marginal $\rho$ (in other words, a coupling)
supported on the set of pairs $(x, y)$ for which $x = y$ or $x$ covers $y$ in the coordinatewise partial order;
here $x$ is said to cover $y$ when $x > y$ but there is no $z$ such that $x > z > y$. We denote the covering
relation in $\mathcal{B}_n$ by $x \triangleright y$, and one measure covering another by $\nu \triangleright \rho$. Stochastic covering is strictly
stronger than stochastic domination, and may be thought of as "stochastic domination, but by at
most 1".

Suppose that $x \ge y$ and we compare the conditional laws $\mathbb{P}_x := (\mathbb{P} \mid X_j = x_j, j \in S)$ and
$\mathbb{P}_y := (\mathbb{P} \mid X_j = y_j, j \in S)$ on the remaining coordinates, that is as laws on $\{0,1\}^{S^c}$. If $\mathbb{P}$ and all its
conditionalizations are negatively associated, it follows that $\mathbb{P}_y \succeq \mathbb{P}_x$.
Definition 2.1 (stochastic covering property). We say that a probability measure $\nu$ on $\mathcal{B}_n$ has the
stochastic covering property if for every $S \subseteq \{1, \ldots, n\}$ and for every $x, y \in \{0,1\}^S$ with $x \triangleright y$, the
conditional law $(\nu \mid X_j = x_j, j \in S)$ is covered by the conditional law $(\nu \mid X_j = y_j, j \in S)$.

It is shown (see [BBL09, Theorem 4.2]) that the strong Rayleigh property implies the projected
homogeneous Rayleigh property (PHR), meaning that the measure can be embedded as the
first $n$ coordinates of a measure $\nu'$ on $\mathcal{B}_m$ ($m \ge n$) that has the ordinary Rayleigh property; the
ordinary Rayleigh property is that the partial derivatives of the generating function
$F(z_1, \ldots, z_n) := \sum_{\omega} \mathbb{P}(\omega) \prod_{j=1}^n z_j^{\omega_j}$ satisfy $F_i F_j \ge F F_{ij}$ at any point with positive real coordinates. We record two further
consequences.

Proposition 2.2. PHR (and hence strong Rayleigh) implies the stochastic covering property.

Proof: PHR implies negative association of all conditionalizations (CNA) [BBL09, Theorem 4.10];
the homogeneous extension $\nu'$ witnessing the PHR property is also PHR hence also CNA. By negative
association, if $x \triangleright y$ then $(\nu' \mid X_j = y_j, j \in S) \succeq (\nu' \mid X_j = x_j, j \in S)$, when viewed as measures on
the coordinates in $[m] \setminus S$. Because $\nu'$ is homogeneous, the coupling that witnesses this $\succeq$ relation in
fact witnesses the relation $\triangleright$. Restricting to $[n] \setminus S$ we see that $(\nu \mid X_j = y_j, j \in S) \triangleright (\nu \mid X_j = x_j, j \in S)$.
□

Proposition 2.3 ([BBL09, Theorem 4.19]). Let $\mathbb{P}_k$ denote $\mathbb{P}$ conditioned on $N = k$. Then for every
$0 \le k \le n - 1$, with $\mathbb{P}(N = k)$ and $\mathbb{P}(N = k + 1)$ both nonzero, we have the covering relation

$$\mathbb{P}_{k+1} \triangleright \mathbb{P}_k.$$

□

3 Results

The chief consequence of the strong Rayleigh property that we use to prove concentration inequalities
is the stochastic covering property. Although all of our examples so far of measures with the SCP
are in fact strong Rayleigh, we note that this may not be the case in the future, and with this in
mind, we state a result that uses only the SCP. The first bound, depending only on $k$, is the most
useful, but the second one is better when $k > n/16$.

Theorem 3.1 (homogeneity and SCP implies Gaussian concentration). Let $\mathbb{P}$ be a $k$-homogeneous
probability measure on $\mathcal{B}_n$ satisfying the SCP. Let $f$ be a Lipschitz-1 function on $\mathcal{B}_n$. Then the
following two inequalities hold.

$$\mathbb{P}(f - \mathbb{E} f \ge a) \le \exp[-a^2/(8k)]$$
$$\mathbb{P}(f - \mathbb{E} f \ge a) \le \exp[-2a^2/n]$$
Replacing $f$ with $-f$ gives two-sided bounds as an immediate corollary.

Corollary 3.2.

$$\mathbb{P}(|f - \mathbb{E} f| \ge a) \le 2\exp[-a^2/(8k)]$$
$$\mathbb{P}(|f - \mathbb{E} f| \ge a) \le 2\exp[-2a^2/n]$$



For strong Rayleigh measures that are not necessarily homogeneous, we have the following result. 

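Theorem 3.3 (Gauss-Poisson bounds for general strong Rayleigh measures). Let $\mathbb{P}$ be strong
Rayleigh with mean $\mu = \mathbb{E} N$. Let $f : \mathcal{B}_n \to \mathbb{R}$ be Lipschitz-1. Then

$$\mathbb{P}(f - \mathbb{E} f \ge a) \le 3\exp\left[-\frac{a^2}{4(a + 2\mu)}\right]$$
$$\mathbb{P}(|f - \mathbb{E} f| \ge a) \le 5\exp\left[-\frac{a^2}{4(a + 2\mu)}\right]$$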



Remark. Without loss of generality a + ^ < n (or else the probability is zero) leading to simpler 
upper bounds that are respectively 3 and 5 times exp j . 



Continuous versions 

Continuous versions of these results may be stated in terms of point processes, which we now briefly
review. Formally, a point process on a space $S$ is a random counting measure on $S$. In other
words, a point process is a map $Z$ defined on a probability space $(\Omega, \mathbb{P})$ taking values in the space
of counting measures on $S$, a counting measure being one that takes only integer values or $+\infty$.
Intuitively, one envisions the sample counting measure $Z(\omega)$ as a set of points such that the sum of
delta functions at these points is the sample counting measure.

If the number $k$ of points in the support of $Z$ is deterministic, we may dispense with much of the
formalism by ordering the points in the support of $Z$ uniformly at random and identifying the process
$Z$ with the resulting exchangeable probability law on sequences of length $k$ in $S$. Notationally, if $Z$ is
a $k$-homogeneous point process on $\mathbb{R}^d$ with law $\mathbb{P}$, we denote by $\mathbb{P}_+$ the corresponding exchangeable
law on $(\mathbb{R}^d)^k$. For $1 \le j \le k$, we use $X_j$ to denote the "$j$th random point", that is, the $j$th coordinate
function on $(\mathbb{R}^d)^k$. The following sampling algorithm for any $k$-homogeneous point process is almost
trivial once one identifies $Z$ with $\mathbb{P}_+$, and yet it is a generalization of an algorithm previously proved
only in the case of determinantal point processes in [HKPV09, Proposition 4.4.3].

Lemma 3.4 (sampling in $k$ steps). Let $Z$ be a $k$-homogeneous point process on a standard Borel
space $S$ and let $\mathbb{P}_+$ be the corresponding exchangeable measure on $S^k$. Then for $0 \le j < k$ there are
regular conditional distributions $Q_{x_1, \ldots, x_j}$ for the law of $X_{j+1}$ given $X_1 = x_1, \ldots, X_j = x_j$ such that
the following procedure samples from $\mathbb{P}_+$.

Sample $X_1$ from $Q_{\emptyset}$.

Recursively, conditional on $X_1 = x_1, \ldots, X_j = x_j$, sample $X_{j+1}$ from $Q_{x_1, \ldots, x_j}$.

In the case where $S$ is finite, let $R$ denote the random set $\{X_1, \ldots, X_k\}$. Then the law $Q_{x_1, \ldots, x_j}$ is
equal to $1/(k - j)$ times the conditional intensity measure of $R \setminus \{x_1, \ldots, x_j\}$ given $x_1, \ldots, x_j \in R$.

Proof: Any standard Borel space admits regular conditional distributions [Dur04, Theorem 4.1.6].
The sampling algorithm essentially restates the definition of regular conditional probabilities for
sequential sampling. Because $\mathbb{P}_+$ is exchangeable, conditioning on $X_{i_1} = x_1, \ldots, X_{i_j} = x_j$ gives the
same exchangeable measure on the sequence of remaining elements of $R$ for any $i_1, \ldots, i_j$. Thus for
any $x$ other than $x_1, \ldots, x_j$, we have

$$\mathbb{P}(x \in R \mid x_1 \in R, \ldots, x_j \in R) = (k - j)\, \mathbb{P}(X_{j+1} = x \mid X_1 = x_1, \ldots, X_j = x_j)$$

which is the final conclusion. □
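For a finite ground set the lemma can be verified by exact computation. The sketch below is an illustration under our own assumptions: a measure on $k$-subsets is stored as a dictionary from `frozenset` to `Fraction`, and the helper names `next_point_law` and `sampled_set_law` are hypothetical. It builds each step's law as $1/(k-j)$ times the conditional intensity and then computes the exact law of the set produced after $k$ steps.

```python
from fractions import Fraction

def next_point_law(measure, chosen, k):
    """Law of the next point: 1/(k-j) times the conditional intensity
    of R minus the chosen points, given that the chosen points lie in R."""
    chosen_set = frozenset(chosen)
    mass = sum(p for base, p in measure.items() if chosen_set <= base)
    law = {}
    for base, p in measure.items():
        if chosen_set <= base:
            for x in base - chosen_set:
                law[x] = law.get(x, Fraction(0)) + p / mass / (k - len(chosen))
    return law

def sampled_set_law(measure, k):
    """Exact law of the unordered set produced by the k-step procedure."""
    sequences = {(): Fraction(1)}
    for _ in range(k):
        extended = {}
        for seq, prob in sequences.items():
            for x, q in next_point_law(measure, seq, k).items():
                extended[seq + (x,)] = prob * q
        sequences = extended
    law = {}
    for seq, prob in sequences.items():
        key = frozenset(seq)
        law[key] = law.get(key, Fraction(0)) + prob
    return law

# A 2-homogeneous measure on subsets of {0, 1, 2, 3}, chosen arbitrarily.
measure = {frozenset({0, 1}): Fraction(1, 2),
           frozenset({1, 2}): Fraction(1, 3),
           frozenset({0, 3}): Fraction(1, 6)}
print(sampled_set_law(measure, 2) == measure)
```

Each ordered sequence of a base $B$ receives probability $\mathbb{P}(R = B)/k!$, so summing over the $k!$ orderings recovers the original measure.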

Remark. In the case of a measure on a finite set of size $n$, the main point of this sampling scheme
is to sample in $k$ steps rather than $n$ steps, so as better to control the Azuma martingale. But also,
sequential conditioning on $x \in R$ can be easy to compute. For example, conditioning on an edge
being in a spanning tree replaces the original graph by a contraction along that edge.

Say that an exchangeable measure on $S^k$ stochastically covers an exchangeable measure on $S^{k-1}$
if the two may be coupled so that the second is always a subset of the first. We say that the
$k$-homogeneous point process $\mathbb{P}$ has the stochastic covering property if

$$\mathbb{P}_{x_1, \ldots, x_j} \triangleright \mathbb{P}_{x_1, \ldots, x_j, x_{j+1}}$$

for all choices of $x_1, \ldots, x_{j+1}$.

By a Lipschitz functional of a $k$-homogeneous point process, we mean an exchangeable map
$f : S^k \to \mathbb{R}$ that is Lipschitz with respect to the total variation metric on $S^k/\mathfrak{S}_k$, this being the
total variation distance between the corresponding counting measures.

Theorem 3.5. Let $Z$ be a $k$-homogeneous point process on $\mathbb{R}^d$ and let $f$ be a Lipschitz-1 function
on counting measures with total mass $k$ on $\mathbb{R}^d$. If $Z$ has the SCP, then

$$\mathbb{P}(f - \mathbb{E} f \ge a) \le \exp\left[-\frac{a^2}{8k}\right].$$

For point processes that are not homogeneous, as in the discrete case, we require more than the 
SCP. Rather than defining a notion of strong Rayleigh here, we will stick to the case of determinantal 
point processes, this being where all of our examples arise; see Section [STU for definitions. The total 
variation distance, hence the Lipschitz notion, extends to the union over k of S'^/S'^. 
Theorem 3.6. Let $Z$ be a determinantal point process with $\mathbb{E} N = \mu < \infty$. Let $f$ be a Lipschitz-1
function on finite counting measures. Then

$$\mathbb{P}(f - \mathbb{E} f \ge a) \le 3\exp\left[-\frac{a^2}{4(a + 2\mu)}\right]$$
$$\mathbb{P}(|f - \mathbb{E} f| \ge a) \le 5\exp\left[-\frac{a^2}{4(a + 2\mu)}\right]$$



4 Proofs 



4.1 The classical proofs 



To prove bounds such as (1.1), one obtains an upper bound for $\mathbb{E} e^{\lambda S_n}$, and then applies Markov's
inequality, choosing $\lambda$ optimally. Underlying the bounds on $\mathbb{E} e^{\lambda S_n}$ are corresponding bounds for
compensated increments. Let $\Delta$ denote a variable with mean zero. Three classical exponential
bounds are as follows.



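$$|\Delta| \le 1 \;\Longrightarrow\; \mathbb{E} e^{\lambda \Delta} \le e^{\lambda^2/2} \qquad (4.7)$$
$$\Delta \in [a, b] \;\Longrightarrow\; \mathbb{E} e^{\lambda \Delta} \le \exp\left[\lambda^2 (b - a)^2/8\right] \qquad (4.8)$$
$$\Delta \in [a, b] \;\Longrightarrow\; \mathbb{E} e^{\lambda \Delta} \le \exp\left[(e^{\lambda} - 1 - \lambda)\,|ab|\right] \qquad (4.9)$$

These are used together with the following two special cases of Markov's inequality.

$$\mathbb{E} e^{\lambda X} \le e^{b\lambda^2/2} \;\Longrightarrow\; \mathbb{P}(X \ge a) \le e^{-a^2/(2b)} \qquad (4.10)$$
$$\mathbb{E} e^{\lambda X} \le e^{b(e^{\lambda} - \lambda - 1)} \;\Longrightarrow\; \mathbb{P}(X \ge a) \le e^{a}\left(\frac{b}{a + b}\right)^{a + b} \le \exp\left[-\frac{a^2}{2(a + b)}\right] \qquad (4.11)$$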



These inequalities have appeared many times in the literature. Inequalities (4.7) and (4.10)
constitute the classical Azuma-Hoeffding inequality and imply

$$\mathbb{E} e^{\lambda(S_n - \mu_n)} \le e^{n\lambda^2/2} \qquad (4.12)$$
$$\mathbb{P}(S_n - \mu_n \ge a) \le e^{-a^2/(2n)}. \qquad (4.13)$$

This is valid for any martingale with differences bounded by 1; an exposition can be found in [AS08,
Theorem 7.2.1]. The improvement to (4.8) is present already in [Hoe63], though the exposition
in [McD89] is clearer (see Lemma 5.8 therein). When the increments of $S_n - \mu_n$ are compensated
Bernoullis, one may take $b - a = 1$ rather than 2, resulting in an improvement by a factor of 4 in
the exponent,

$$\mathbb{E} e^{\lambda(S_n - \mu_n)} \le e^{n\lambda^2/8}, \qquad (4.14)$$

which together with (4.10) yields (1.1). Finally, (4.9) and induction yield

$$\mathbb{E} e^{\lambda(S_n - \mu_n)} \le \exp\left[V_n (e^{\lambda} - 1 - \lambda)\right], \qquad (4.15)$$

where $V_n := \sum_{k=1}^n p_k(1 - p_k)$ is the variance of $S_n$; together with (4.11) this implies (1.3); these
results appear, for instance, in [Fre75, (1.3)-(1.6)].
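The passage from a Poisson-type moment generating function bound to a tail bound, as we read (4.11), is an optimization over $\lambda$: Markov's inequality gives $\mathbb{P}(X \ge a) \le \exp[b(e^{\lambda} - \lambda - 1) - \lambda a]$, and the minimizer is $\lambda = \log(1 + a/b)$. The following sketch checks this numerically; the function names are illustrative assumptions of ours.

```python
from math import exp, log

def chernoff_at(a, b, lam):
    """Markov bound exp(b*(e^lam - lam - 1) - lam*a) at a given lam > 0."""
    return exp(b * (exp(lam) - lam - 1) - lam * a)

def optimized_chernoff(a, b):
    """Value at the minimizer lam = log(1 + a/b): e^a * (b/(a+b))^(a+b)."""
    return exp(a) * (b / (a + b)) ** (a + b)

def simplified_bound(a, b):
    """The weaker but more convenient bound exp(-a^2 / (2(a+b)))."""
    return exp(-a * a / (2 * (a + b)))

for a, b in [(1, 1), (3, 2), (10, 5), (0.5, 20)]:
    lam_star = log(1 + a / b)
    print(a, b, chernoff_at(a, b, lam_star),
          optimized_chernoff(a, b), simplified_bound(a, b))
```

The second and third printed columns should agree, and both should lie below the last column.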

To prove the generalization to Lipschitz functions, let

$$M_k := \mathbb{E}(f(X_1, \ldots, X_n) \mid X_1, \ldots, X_k).$$

It is immediate that $\{M_k\}$ is a martingale and that conditional on $X_1, \ldots, X_{k-1}$, the two possible
values of $M_k$ differ by at most 1. Hence, conditional on $X_1, \ldots, X_{k-1}$, the increment $\Delta_k := M_k - M_{k-1}$
is constrained to an interval of length at most 1. Applying (4.8) then yields (1.4).

The extension of inequalities (1.1)-(1.2) to negatively cylinder dependent random variables is
established by examining the power series for $e^{\lambda S_n}$. This may be expanded into positive sums of
expectations of products of powers of the variables $\{X_j : 1 \le j \le n\}$. Negative cylinder dependence
implies that these are bounded from above by the corresponding products of expectations.
Therefore, (4.14) and (4.15) hold when the assumption of independence is replaced by negative cylinder
dependence, whence the probability inequalities (1.1) and (1.3) hold as well. This and more is shown
in [PS97, Theorem 3.4], specializing their more general negative cylinder property to $\lambda = 1$. We
remark that only the first inequality (1.5) in the definition of negative cylinder dependence is used
to obtain bounds on $\mathbb{E} e^{\lambda S_n}$ for $\lambda > 0$, which suffices for the upper tail bounds. Lower tail bounds
require these inequalities for $\lambda < 0$, for which the second inequality (1.6) is required.

4.2 Proof of Theorems 3.1 and 3.5

Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space on which is constructed the generalized sampling scheme
described in Lemma 3.4. Let $\mathcal{F}_j := \sigma(X_1, \ldots, X_j)$ and let $\mathbb{P}^{(j)}$ denote the random exchangeable
measure $\mathbb{P}_{X_1, \ldots, X_j}$ on $S^{k-j}$. Let

$$M_j := \mathbb{E}(f \mid \mathcal{F}_j) - \mathbb{E} f \qquad (4.16)$$

denote the martingale of sequential revelation. The increments of this martingale are

$$\Delta_j := \mathbb{E}(f \mid \mathcal{F}_{j+1}) - \mathbb{E}(f \mid \mathcal{F}_j).$$

The measures $\mathbb{P}^{(j+1)} + \delta_{x_{j+1}}$ and $\mathbb{P}^{(j)}$ can be coupled so that the samples always have distance at
most 2; intuitively, to get from one to the other you add a point at $x_{j+1}$ and then drop a point.
By the Lipschitz assumption on $f$, it follows that $|M_{j+1} - M_j| \le 2$. We now apply the basic
Azuma-Hoeffding inequality (4.13) to $\{M_j/2\}_{1 \le j \le k}$, yielding

$$\mathbb{P}(f - \mathbb{E} f \ge a) = \mathbb{P}\left(\frac{M_k}{2} \ge \frac{a}{2}\right) \le \exp\left(-\frac{a^2}{8k}\right).$$

□



4.3 Proof of Theorems 3.3 and 3.6

In this section we assume $\mathbb{P}$ is the law of either a strong Rayleigh measure on $\mathcal{B}_n$ or a determinantal
point process $Z$ with finite mean $\mathbb{E} N = \mu$. We also let $f$ denote an arbitrary but fixed Lipschitz-1
function on configurations and define a function $\phi$ on $\mathbb{Z}^+$ by

$$\phi(k) := \mathbb{E}(f \mid N = k).$$

Lemma 4.1. The variable $N$ is distributed as the (possibly infinite) sum of independent Bernoullis.

Proof: In the definition of the strong Rayleigh property, setting the variables $z_1, \ldots, z_n$ equal
produces a univariate polynomial with no roots in the upper half plane. As pointed out at the
beginning of Section 3 of [BBL09], such a polynomial with nonnegative real coefficients must have
all its roots real, which implies that it generates a convolution of Bernoullis. For determinantal point
processes with finite mean number of points, this is [HKPV09, Theorem 4.5.3]. □

Lemma 4.2. The variable $N$ satisfies

$$\mathbb{E} e^{\lambda(N - \mu)} \le \exp\left[\mu(e^{\lambda} - 1 - \lambda)\right]$$

and consequently for any $a > 0$,

$$\mathbb{P}\left(N - \mu \ge \frac{a}{2}\right) \le \exp\left(-\frac{(a/2)^2}{2(\mu + a/2)}\right).$$

Proof: By Lemma 4.1, $N$ is distributed as the sum of independent Bernoullis, which implies the
first inequality; this implies the second inequality by (4.11). □
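The first inequality of Lemma 4.2 rests on the elementary estimate $1 + x \le e^x$ applied to each factor $1 + p_j(e^{\lambda} - 1)$ of the moment generating function. A minimal numerical sketch, with an arbitrary choice of Bernoulli means and illustrative function names of our own:

```python
from math import exp

def centered_mgf(ps, lam):
    """E exp(lam * (N - mu)) for N a sum of independent Bernoulli(p) variables."""
    mu = sum(ps)
    value = exp(-lam * mu)
    for p in ps:
        value *= 1 - p + p * exp(lam)
    return value

def poisson_type_bound(ps, lam):
    """The bound exp(mu * (e^lam - 1 - lam)) with mu the mean of N."""
    mu = sum(ps)
    return exp(mu * (exp(lam) - 1 - lam))

ps = [0.1, 0.5, 0.9, 0.3]   # arbitrary Bernoulli means, for illustration only
for lam in (0.1, 0.5, 1.0, 2.0):
    print(lam, centered_mgf(ps, lam), poisson_type_bound(ps, lam))
```

The bound is what one would obtain for a Poisson variable of mean $\mu$, which is why the resulting tail estimate is of Poisson rather than Gaussian type.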

Lemma 4.3. The function $\phi$ is Lipschitz-1.

Proof: By Proposition 2.3 in the case of strong Rayleigh measures on $\mathcal{B}_n$ or Proposition 5.10 in
the case of determinantal point processes, we know that $\mathbb{P}_{k+1} \triangleright \mathbb{P}_k$. By definition of the stochastic
covering relation, $(\phi(k+1), \phi(k))$ may be written as $\mathbb{E}(f(\eta), f(\xi))$ where $d(\eta, \xi) = 1$ almost surely.
The conclusion then follows from the fact that $f$ is Lipschitz-1. □
Lemma 4.4. The random variable $\phi(N)$ satisfies the concentration inequality

$$\mathbb{E} e^{\lambda(\phi(N) - \mathbb{E}\phi(N))} \le e^{\mu(e^{\lambda} - 1 - \lambda)}.$$

Consequently, the upper tails of $\phi(N)$ obey the bound

$$\mathbb{P}(\phi(N) - \mathbb{E}\phi(N) \ge t) \le \exp\left[-\frac{t^2}{2(t + \mu)}\right].$$

Proof: Pursuant to Lemma 4.1, let $\{Y_j\}$ be a finite or countably infinite collection of independent
Bernoulli variables whose sum has the same law as $N$. Then $\phi(N) - \mathbb{E}\phi(N)$ has the same distribution
as $\phi(\sum_j Y_j) - \mathbb{E}\phi(N)$ and we can write this as the final term of a martingale $\{M_n\}$ where
$M_n := \mathbb{E}(\phi(\sum_j Y_j) \mid \mathcal{F}_n) - \mathbb{E}\phi(N)$ and $\mathcal{F}_n := \sigma(Y_1, \ldots, Y_n)$. Conditional on $\mathcal{F}_n$, the distribution of $M_{n+1}$ is
concentrated on two values:

$$(M_{n+1} \mid \mathcal{F}_n) = p\,\delta_a + (1 - p)\,\delta_b$$

where $p$ is the mean of the Bernoulli variable $Y_{n+1}$ and $a$ (respectively $b$) is the conditional mean
of $\phi(\sum_j Y_j) - \mathbb{E}\phi(N)$ given the values of $Y_1, \ldots, Y_n$ and conditioned on $Y_{n+1} = 1$ (respectively 0).
The obvious coupling of the conditional distributions of $\sum_j Y_j$ given $Y_1, \ldots, Y_n$ and the two possible
values of $Y_{n+1}$ has the first always greater by precisely 1. Because $\phi$ is Lipschitz-1, we conclude that
$|a - b| \le 1$. We then obtain

$$\mathbb{E}\left(e^{\lambda(M_{n+1} - M_n)} \mid \mathcal{F}_n\right) \le \exp\left(p(1 - p)(e^{\lambda} - 1 - \lambda)\right).$$

The lemma follows by induction. □

Proof of Theorems: The event $\{f - \mathbb{E} f \ge a\}$ is contained in the union of three events:

$$\left\{N \ge \mu + \frac{a}{2}\right\} \cup \left\{\phi(N) - \mathbb{E} f \ge \frac{a}{2}\right\} \cup \left\{f - \phi(N) \ge \frac{a}{2},\; N \le \mu + \frac{a}{2}\right\}.$$

Thus $\mathbb{P}(f - \mathbb{E} f \ge a)$ is bounded above by the sum of the corresponding probabilities. Each of these
is bounded above by $\exp[-a^2/(4(a + 2\mu))]$ for a different reason. The first is Lemma 4.2 and the
second is Lemma 4.4, noting that $\mathbb{E}\phi(N) = \mathbb{E}\,\mathbb{E}(f \mid N) = \mathbb{E} f$. For the last inequality, observe that the
measures $\mathbb{P}_k$ are all strong Rayleigh (this is [BBL09, Corollary 4.18]). For any $k \le \mu + a/2$, we can
apply Theorem 3.1 to the homogeneous measure $\mathbb{P}_k$, obtaining

P (/ - ,m > 1 1 A- . < »p (-Ifl!) < „p (- ^ 

Reassembling these gives the upper bound 

P(/-*(A')>|,iV<. + f)<exp(-j^)^ 

Summing the three bounds gives the first inequality of the theorem. 

For the two-sided bound, we need to consider two more events in addition to the three already
considered, namely the events that $\phi(N) - \mathbb{E}\phi(N) \le -a/2$ or $f - \phi(N) \le -a/2$; note that we do not need to
consider the event that $N < \mu + a/2$. The arguments for these two extra events are exactly analogous
to two of the three arguments we have already seen, leading to a bound of $\exp[-a^2/(4(a + 2\mu))]$ for
each of the two new summands and establishing the two-sided bounds. □
5 Applications



In this section we discuss some classes of measures known to satisfy the hypotheses of our
concentration results. The following Venn diagram gives a sense of how these classes intersect each other.




Figure 2: some classes of strong Rayleigh measures 



5.1 Matroids 

A collection $\mathcal{C}$ of subsets of a finite set $E$, all of a given cardinality, $k$, is said to be the set of bases of
a matroid if it satisfies the base exchange axiom (see, e.g., [Tut71]): if $A$ and $B$ are distinct members
of $\mathcal{C}$ and $a \in A \setminus B$, then there exists $b \in B \setminus A$ such that $A \cup \{b\} \setminus \{a\} \in \mathcal{C}$. Given a matroid, it
is natural to consider the uniform measure on $\mathcal{C}$. More generally, the weighted random base is
chosen from the probability measure

$$\nu_w(B) := C \prod_{e \in B} w(e),$$

where $\{w(e) : e \in E\}$ is a collection of nonnegative real numbers (weights) and $C$ is a normalizing
constant. Identifying $E$ with the set $\{1, \ldots, |E|\}$, the measure $\nu_w$ and the random variables
$X_e := \mathbf{1}_{e \in B}$ can be thought of as living on $\mathcal{B}_n$.

For general matroids, $\mathbb{E} X_e X_f$ may be greater than $(\mathbb{E} X_e)(\mathbb{E} X_f)$. Some speculation has been
given to the most natural class of matroids for which negative correlation or negative association
must hold. Feder and Mihail [FM92] define a balanced matroid to be a matroid all of whose minors
satisfy pairwise negative correlation. Their proof of the following fact was the basis for the original
proof of negative association for determinantal processes [Lyo03, Theorem 6.5].

Proposition 5.1 ([FM92, Theorem 3.2]). The law $\nu_w$ of a random base of a balanced matroid,
multiplicatively weighted by the weighting function $w$, has the SCP. □

Because measures supported on the bases of a matroid are homogeneous, there is nothing gained
by improving the SCP to the strong Rayleigh property, and we have the following immediate
corollary.

Corollary 5.2. Let $f$ be a Lipschitz-1 function with respect to Hamming distance on the bases of a
balanced matroid of rank $k$ on $n$ elements. Then

$$\mathbb{P}(f - \mathbb{E} f \ge a) \le \exp\left(-\frac{a^2}{\min\{8k,\, n/2\}}\right).$$

□

Example 5.3 (spanning trees). One of the most important examples of a matroid is the set of
spanning trees of a finite, connected, undirected graph. To spell this out, a spanning tree for a finite
graph $G = (V, E)$ is a subset $E' \subseteq E$ such that $(V, E')$ is connected and acyclic. The set of spanning
trees is a matroid on E. The weighted random spanning tree was shown to be a balanced matroid 
by Theorem Ij. In fact they showed it is determinantal (see also lLyo03[ Example 1.1] 

and [HKPV09, Example 4.3.2]), though at the time consequences of being determinantal, such as
the strong Rayleigh property, had not been developed. Spanning trees are the only well known class
of matroid whose uniform (or weighted) measure is determinantal.

Let $f_0 : \{0,1\}^E \to \mathbb{Z}$ count the number of vertices of odd degree in the graph defined by any
subset of the edges. Deleting or adding an edge changes $f_0$ by at most 2. Let $f$ be the random
variable resulting from applying $(1/2) f_0$ to the weighted random spanning tree on a graph $G$.
Thus $f$ is a Lipschitz-1 function that counts half the number of vertices that have odd degree in
the random tree. Random variables that count local properties such as this are of natural graph
theoretic interest. Parity counting variables similar to $f$ play a role, for example, in the randomized
TSP approximation algorithm of [GSS11]. The number of edges in any spanning tree is $|V| - 1$.
An application of Corollary 5.2 immediately gives the concentration inequality in Theorem 1.1; note
that $|V|$, rather than $|E|$, appears in the denominator of the exponent.
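For a very small graph the quantities in this example can be computed by brute force. The sketch below is our own illustration, assuming the complete graph $K_4$ and the uniform (unweighted) spanning tree: it enumerates the spanning trees, evaluates $f$ as half the number of odd-degree vertices, and compares an exact upper tail with the bound $\exp[-a^2/(8k)]$ for $k = |V| - 1$.

```python
from itertools import combinations
from math import exp

def spanning_trees(num_vertices, edges):
    """All spanning trees, by brute force over edge subsets of size |V| - 1."""
    trees = []
    for subset in combinations(edges, num_vertices - 1):
        parent = list(range(num_vertices))

        def find(v):
            # Union-find root lookup with path halving.
            while parent[v] != v:
                parent[v] = parent[parent[v]]
                v = parent[v]
            return v

        acyclic = True
        for u, v in subset:
            root_u, root_v = find(u), find(v)
            if root_u == root_v:
                acyclic = False
                break
            parent[root_u] = root_v
        if acyclic:          # |V| - 1 edges and no cycle: a spanning tree
            trees.append(subset)
    return trees

def half_odd_degree_count(num_vertices, tree):
    """The function f: half the number of vertices of odd degree."""
    degree = [0] * num_vertices
    for u, v in tree:
        degree[u] += 1
        degree[v] += 1
    return sum(d % 2 for d in degree) / 2

num_vertices = 4
edges = list(combinations(range(num_vertices), 2))   # complete graph K4
trees = spanning_trees(num_vertices, edges)
values = [half_odd_degree_count(num_vertices, t) for t in trees]
mean = sum(values) / len(values)

def upper_tail(a):
    """Exact P(f - E f >= a) under the uniform spanning tree measure."""
    return sum(1 for v in values if v - mean >= a) / len(values)

k = num_vertices - 1
print(len(trees), mean, upper_tail(0.75), exp(-0.75**2 / (8 * k)))
```

On a graph this small the bound is far from tight; the point of the sketch is only to make the objects concrete.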

Example 5.4 (independent Bernoullis conditioned on the sum). The bases of the matroid $M(n, k)$
are those subsets of $[n]$ having cardinality exactly $k$. The law $\nu_w$ of the weighted random base on this
matroid has the strong Rayleigh property (this follows from the fact that the strong Rayleigh property
is closed under multiplicative weights and the observation that it is true for the uniform random
base). Restricting to $[m]$, for $m < n$, gives a joint distribution $\mathbb{P}_{n,k,m}$ on $\mathcal{B}_m$ which is a generalized
negative binomial distribution in the sense that the variables are distributed as independent Bernoullis
conditioned on summing to $k$. These measures have the stochastic covering property. We remark
that conditioned Bernoulli measures are not in general strong Rayleigh.
5.2 Exclusion measures

The symmetric group $\mathfrak{S}_n$ acts on $\mathcal{B}_n$ by permuting the coordinates. Suppose a nonnegative rate
$r(\tau)$ is given for each transposition $\tau \in \mathfrak{S}_n$. Define a random evolution on $\mathcal{B}_n$ by letting each pair
of coordinates transpose independently at rate $r(\tau_{ij})$. In other words, we have a continuous
time chain on $\mathcal{B}_n$ which jumps from $x$ to $\tau(x)$ at rate $r(\tau)$ for each transposition $\tau$. This process is
known as the symmetric exclusion process.

Borcea, Brändén and Liggett [BBL09, Proposition 5.1] prove that the strong Rayleigh property
is preserved under this evolution. In particular, because the point mass at a single state is always
strong Rayleigh, it follows that the time $t$ distribution of a symmetric exclusion process started from
a deterministic state is strong Rayleigh. The stochastic covering property follows, as do PHR and
negative association. Interestingly, before the publication of [BBL09], all that was known about this
model was negative cylinder dependence ([Lig77, Lemma 2.3.4]).

Recently, it was shown by [Wag11] that one can add birth and death to the exclusion dynamics
and still preserve the strong Rayleigh property. More specifically, let $\{\alpha_i, \beta_i : 1 \le i \le n\}$ be
positive real numbers and let $\omega_i$ change to one at rate $\alpha_i$ and to zero at rate $\beta_i$, along with the
exclusion dynamics. Then the evolution preserves the strong Rayleigh property and in particular, if
the starting state is deterministic, all time $t$ marginals are strong Rayleigh.

Corollary 5.5. Let P be the law on Bn resulting from running an exclusion process for a fixed time, 
starting from a deterministic state with k sites occupied. Then 

$$\mathbb{P}(f - \mathbb{E} f \ge a) \le e^{-a^2/(8k)}$$

and the same bound holds with 8k replaced by 2n. 

Example 5.6. Let n > be even and populate a n x n square of the integer lattice in 1? (with 
torus boundary conditions) by filling all sites in the left half and leaving empty all sites in the right 
half. Run the symmetric exclusion process for time $t$ with rate 1 on each edge. Let $f_t(\omega)$ denote the
number of edges at time t with exactly one endpoint occupied. The mean of ft varies from n at time 
to a limiting value of . Once t — Q{n^), the mean of ft becomes Q{n^) and the concentration 
inequality 

P(/-IE/ > a) < 6"'/^""') 
gives Gaussian tail bounds (here k = n'^/2). 
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5.3 Determinantal measures on a finite Boolean lattice 



We say that a probability measure $P$ on $\mathcal{B}_n$ is determinantal (in the general sense) if there is an
$n \times n$ real or complex matrix $K$ such that for every $S \subseteq \{1, \ldots, n\}$,

$$\mathbb{E} \prod_{k \in S} X_k = \det(K_S) \qquad (5.1)$$

where $K_S$ is the submatrix of $K$ obtained by choosing only those rows and columns whose index is in
$S$. In this definition, the phrase "general sense" refers to the lack of further assumptions on $K$. An
important subclass is the Hermitian determinantal measures, for which the matrix $K$ is Hermitian.
In this paper we will be interested only in the Hermitian case and will use the term determinantal
hereafter to refer only to the case where $K$ is Hermitian. Determinantal measures are known to
be negatively associated [Lyo03, Theorem 6.5]. In fact they are strong Rayleigh [BBL09, proof of
Theorem 3.4] and therefore satisfy the stochastic covering property.

Example 5.7 (uniform or weighted spanning tree). As previously remarked, the uniform or weighted
random spanning tree is a determinantal measure.
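
For a concrete check of the determinantal identity $\mathbb{E}\prod_{k \in S} X_k = \det(K_S)$, one can take the uniform spanning tree of a triangle. The Python sketch below is illustrative and not part of the source. It assumes the kernel is the transfer current matrix, which for a triangle with cyclically oriented edges is the orthogonal projection $I - J/3$ (with $J$ the all-ones matrix), and the helper name `subset_probabilities` is hypothetical. The probability of each edge set is recovered from the minors by inclusion-exclusion, $P(X = \mathbf{1}_A) = \sum_{B \supseteq A} (-1)^{|B \setminus A|} \det(K_B)$.

```python
from itertools import combinations

import numpy as np

def subset_probabilities(K):
    """Return {A: P(the random set equals A)} for the measure with kernel K."""
    n = K.shape[0]
    ground = range(n)

    def minor(S):
        # det(K_S), with the convention that the empty minor equals 1
        S = list(S)
        return 1.0 if not S else np.linalg.det(K[np.ix_(S, S)]).real

    probs = {}
    for size in range(n + 1):
        for A in combinations(ground, size):
            rest = [i for i in ground if i not in A]
            total = 0.0
            # inclusion-exclusion over supersets B = A + extra
            for m in range(len(rest) + 1):
                for extra in combinations(rest, m):
                    total += (-1) ** m * minor(A + extra)
            probs[frozenset(A)] = total
    return probs

# Assumed kernel for the triangle: projection onto the complement of the cycle space.
K_triangle = np.eye(3) - np.ones((3, 3)) / 3.0
tree_probs = subset_probabilities(K_triangle)
```

Up to rounding, each of the three two-edge sets (the spanning trees) receives probability 1/3 and every other edge set receives probability 0.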

In the next section we will extend the notion of a determinantal measure to the continuous
setting. The extension to a countably infinite set of variables is more straightforward: the kernel $K$
is now indexed by a countably infinite set, but (5.1) may be interpreted as holding for all finite sets
$S$. The following example of a determinantal process on $\mathbb{Z}$ appeared first in [Joh05].

Example 5.8 (positions of non-colliding RW's). Let $\{Y^{(k)} : 1 \le k \le n\}$ be $n$ independent time
homogeneous nearest neighbor random walks on $\mathbb{Z}$. Begin the walks at locations […] and
suppose the event that the walks are all at their starting positions at time $2n$ and have not intersected
has positive probability. Conditional on this event, the positions at time $n$ form a determinantal
measure. That is, the indicator functions $\{X_j\}$ have a determinantal law, where $X_j = 1$ if some
$Y^{(k)}$ is at position $j$ at time $n$, and zero otherwise.

Remark. The positions of non-colliding random walks are given by a determinant under more general
conditions (see [KM59]). The present situation is arranged so as to make the kernel Hermitian.

5.4 Determinantal point processes 

We consider here only simple point processes and often assume $\mathbb{E}N < \infty$ as well. If $\rho_k : (\mathbb{R}^d)^k \to [0, \infty)$
are measurable functions, then the simple point process $Z$ is said to have joint intensities $\{\rho_k\}$ if for
any $k$ and any family $D_1, \ldots, D_k$ of disjoint Borel subsets of $\mathbb{R}^d$,

$$\mathbb{E} \prod_{i=1}^{k} N(D_i) = \int_{D_1 \times \cdots \times D_k} \rho_k(x_1, \ldots, x_k) \, dx_1 \cdots dx_k ,$$

where $N(D)$ denotes the number of points of $Z$ in $D$. In particular,

$$\mathbb{E}N = \int \rho_1(x) \, dx$$

so under the assumption $\mathbb{E}N < \infty$, we see that $\rho_1(x) \, dx$ is a finite measure on $\mathbb{R}^d$. If $\rho_1$ is not finite,
we will assume it is $\sigma$-finite. In any case, $\rho_1$ is called the first intensity measure; see [HKPV09,
Sections 1.2 and 4.2] for further discussion of joint intensities and determinantal measures.

Definition 5.9 (determinantal point process). A point process $Z$ is said to be determinantal if it
has joint intensities $\{\rho_k\}$ and there is a measurable kernel $K : (\mathbb{R}^d)^2 \to \mathbb{C}$ such that

$$\rho_k(x_1, \ldots, x_k) = \det \left( K(x_i, x_j) \right)_{1 \le i, j \le k} . \qquad (5.2)$$

If $K(y, x) = \overline{K(x, y)}$ for every $x, y$, then the process is said to be Hermitian.

Stochastic covering carries over to the continuous case. To state the relevant results we invoke
the notion of the Palm process. This is a version of the process conditioned on the (measure zero)
event of a point at a specified location, $x$. It may be obtained by conditioning on there being a
point within distance $\epsilon$ of a given location $x$, then taking a weak limit. A more complete treatment
may be found in [Kal86]. The first three parts of the conclusion of the following result are proved
in [Gol10].

Proposition 5.10 ([Gol10]). Suppose $Z$ is a determinantal point process with continuous kernel $K$
and finite trace. Fix $x$ and let $Z^x$ denote the Palm process that conditions on a point at $x$. Let $Z^x_0$
denote the result of removing the point at $x$ from $Z^x$. Then

(i) Whenever $K - L$ is positive semi-definite, the process with kernel $K$ stochastically dominates
the process with kernel $L$ (this is [Gol10, Theorem 3]).

(ii) $Z^x_0$ is determinantal with kernel $L$ such that $K - L$ is positive semi-definite.

(iii) Consequently, $Z \succeq Z^x_0$.

(iv) In fact, $Z \triangleright Z^x_0$.



Proof: We need only prove the last statement. If $Z$ is $k$-homogeneous then $Z^x_0$ is $(k-1)$-homogeneous
and stochastic covering follows from stochastic domination. If $A \subseteq S$ then the
restriction $Z|_A$ of the $k$-homogeneous process $Z$ on $S$ to $A$ will satisfy (iv), because conditioning on
$x$ commutes with restriction to $A$. The general result now follows because all processes arise in this
manner. At the operator theoretic level, this is a classical result. It is quoted for instance at the
beginning of [Lyo03, Section 8]. The context there assumes finite dimensionality but in fact it is
true in general as is shown in, e.g., [Pau02, Theorem 1.1]. □

The preceding result provides a continuous analogue to Proposition 5.2. The analogue to
Proposition 2.3 is

Proposition 5.11. Let $Z$ be a determinantal point process with finite mean $\mathbb{E}N = \mu < \infty$. Then
for any $k$ for which $\mathbb{P}(N = k+1)$ and $\mathbb{P}(N = k)$ are both nonzero, the conditional distributions of
$Z$ given $N$ satisfy

$$(Z \mid N = k+1) \triangleright (Z \mid N = k) .$$


The following facts may be found in [HKPV06, Theorem 7]. A determinantal point process $Z$
with mean $\mu < \infty$ has a kernel $K$ whose spectrum is countable, contained in $[0,1]$, and sums to
$\mu$. Furthermore, $Z$ may be represented as a mixture of homogeneous determinantal processes as
follows. Let $\{\lambda_i : i \ge 1\}$ enumerate the eigenvalues with multiplicities and let $\{\phi_i\}$ be a corresponding
eigenbasis. For each $i$, flip an independent coin with success probability $\lambda_i$. Let $I$ denote the set
of $i$ for which the coin-flip was successful. Let $K_I$ be the (random) projection operator onto the
subspace spanned by the eigenvectors $\phi_i$ for which the coin-flip was successful. Then $K_I$ is almost
surely a projection of finite dimension $|I|$ and is the kernel of a $|I|$-homogeneous determinantal point
process. Choosing $K_I$ at random and then sampling from the corresponding process recovers the
law of $Z$.
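
In the finite setting of Section 5.3 the two-stage description above can be turned into a sampler. The following Python sketch is an illustration under stated assumptions, not the authors' construction: the ground set is finite, `K` is a real symmetric or complex Hermitian matrix with spectrum in $[0,1]$, and the second stage uses the standard sequential procedure for sampling a projection determinantal process (pick a point with probability proportional to the squared row norm of an orthonormal basis, then restrict the subspace to vectors vanishing at that point). The function name `sample_determinantal` is hypothetical.

```python
import numpy as np

def sample_determinantal(K, rng):
    """Return (sorted list of sampled points, number of successful coin flips)."""
    lam, phi = np.linalg.eigh(K)            # eigenvalues and orthonormal eigenvectors
    lam = np.clip(lam, 0.0, 1.0)            # guard against rounding outside [0, 1]
    keep = rng.random(lam.size) < lam       # independent coins, success probability lambda_i
    V = phi[:, keep]                        # orthonormal basis of the random subspace
    points = []
    while V.shape[1] > 0:
        # Row norms of an orthonormal basis sum to the current dimension.
        weights = np.sum(np.abs(V) ** 2, axis=1)
        weights = weights / weights.sum()
        i = int(rng.choice(K.shape[0], p=weights))
        points.append(i)
        # Restrict to the vectors of the subspace whose i-th coordinate is zero.
        j = int(np.argmax(np.abs(V[i, :])))
        pivot = V[:, j].copy()
        V = np.delete(V, j, axis=1)
        V = V - np.outer(pivot, V[i, :] / pivot[i])
        if V.shape[1] > 0:
            V, _ = np.linalg.qr(V)          # re-orthonormalize the remaining basis
    return sorted(points), int(keep.sum())

# Example with an assumed 2 x 2 kernel whose eigenvalues are 0.75 and 0.25.
rng = np.random.default_rng(0)
K_example = np.array([[0.5, 0.25], [0.25, 0.5]])
sample, successes = sample_determinantal(K_example, rng)
```

In this sketch the number of sampled points always equals the number of successful coin flips, which is the discrete counterpart of the homogeneity of the process with kernel $K_I$.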

Several consequences are apparent. First, conditioning on $N = k$ is the same as conditioning on
exactly $k$ successes among the Bernoulli trials. Secondly, because independent Bernoullis are strong
Rayleigh, the conditional law of $I$ given $|I| = k+1$ stochastically dominates the conditional law
of $I$ given $|I| = k$. This is equivalent to saying that the conditional law of the random subspace
$K_I$ given $|I| = k+1$ stochastically dominates the conditional law of the random subspace $K_I$
given $|I| = k$, in the sense that the two laws can be coupled as $(K, K')$ so that $K' \subseteq K$. When
$K' \subseteq K$, the operator $\pi_K - \pi_{K'}$ is positive semi-definite. By (ii) of Proposition 5.10 we conclude
that $(Z \mid N = k+1) \succeq (Z \mid N = k)$, which is equivalent to stochastic covering in this case. □

Example 5.12 (Ginibre's translation invariant process). Ginibre [Gin65] considers the distribution
of eigenvalues of a $k \times k$ matrix with independent complex Gaussian entries. In the limit as
$k \to \infty$, the density becomes constant over the whole plane. The limiting process $Z$ turns out to be
a (Hermitian) determinantal point process with kernel

$$K(z, w) = \frac{1}{\pi} \exp\left( z \bar{w} - \frac{|z|^2}{2} - \frac{|w|^2}{2} \right) ;$$

see, e.g., [Sos00, (2.16)]. The process $Z$ is ergodic and invariant under all rigid transformations of
the plane. It was suggested in [LCH90] to use this process as the set of centers for a random Voronoi
tessellation because the mutual repulsion of the points makes the resulting tessellation more realistic
than the standard Poisson-Voronoi tessellation for many purposes. Some rigorous results along these
lines were obtained in [Gol10].
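
As a quick numerical illustration of the determinantal form of the joint intensities for this process, the Python sketch below evaluates them as determinants. It assumes the kernel in the form $K(z,w) = \pi^{-1}\exp(z\bar{w} - |z|^2/2 - |w|^2/2)$ with respect to Lebesgue measure on the plane, and the function names are hypothetical. With this kernel $|K(z,w)|^2 = \pi^{-2} e^{-|z-w|^2}$, so the two-point intensity is $\pi^{-2}(1 - e^{-|z-w|^2})$, which is smaller than the product of the one-point intensities: nearby pairs of points are suppressed.

```python
import numpy as np

def ginibre_kernel(z, w):
    """Assumed form of the kernel of the limiting Ginibre process."""
    return np.exp(z * np.conj(w) - abs(z) ** 2 / 2 - abs(w) ** 2 / 2) / np.pi

def joint_intensity(points, kernel=ginibre_kernel):
    """rho_k(x_1, ..., x_k) computed as det(K(x_i, x_j)) over the given points."""
    pts = list(points)
    M = np.array([[kernel(z, w) for w in pts] for z in pts], dtype=complex)
    return np.linalg.det(M).real

z, w = 0.3 + 0.1j, -0.2 + 0.5j
rho_1 = joint_intensity([z])        # one-point intensity at z
rho_2 = joint_intensity([z, w])     # two-point intensity at (z, w)
```

Here `rho_1` agrees with $1/\pi$ regardless of the choice of `z`, reflecting the translation invariance of the process.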

The mean number of points in any region $D$ is $1/\pi$ times the area $|D|$, so the restriction of $Z$
to such a region of finite area is a determinantal process with finite mean number of points. Fix a
finite region, $D$, and let $f$ count the number of "lonely" points in $D$, these being such that no other
point of $Z$ in $D$ is within distance 1. We claim that $f$ is Lipschitz with constant equal to 6. Clearly
if a point $z$ is added to the configuration $\eta$ then $f$ can increase by at most 1. It is well known that
the maximum number of points in a unit disk that can be at mutual distance of at least 1 from one
another is 6, which implies that the addition of $z$ can result in the loss of at most 6 lonely points.
Applying Theorem 3.6 to $f/6$ yields the concentration inequality

$$\mathbb{P}(|f - \mathbb{E}f| > a) \le 5 \exp\left( \cdots \right) .$$
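
The Lipschitz claim can be checked numerically on arbitrary finite configurations. The following Python sketch is only an illustration: the configurations are uniform random points in a square rather than samples of the Ginibre process, and the function name `count_lonely` is hypothetical. It counts lonely points and records the change in the count when one point is added.

```python
import numpy as np

def count_lonely(points, radius=1.0):
    """Number of points with no other point of the configuration within `radius`."""
    pts = np.asarray(points, dtype=float).reshape(-1, 2)
    if len(pts) == 0:
        return 0
    diff = pts[:, None, :] - pts[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)       # a point is not its own neighbour
    return int((dist.min(axis=1) > radius).sum())

rng = np.random.default_rng(0)
changes = []
for _ in range(200):
    eta = rng.uniform(0.0, 5.0, size=(rng.integers(0, 30), 2))   # configuration eta
    z = rng.uniform(0.0, 5.0, size=(1, 2))                       # point to be added
    changes.append(count_lonely(np.vstack([eta, z])) - count_lonely(eta))
```

By the packing argument in the example, every entry of `changes` should lie between $-6$ and $1$.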

Example 5.13 (Zeros of random polynomials). Let $\{X_n\}$ be IID standard complex Gaussian random
variables and define the random power series

$$h(z) := \sum_{n=0}^{\infty} X_n z^n .$$

It is easy to see that $h$ is almost surely analytic on the open unit disk and the number of zeros on
any disk of radius $\rho < 1$ has finite mean. The remarkable properties of the point process $Z$ on the
unit disk that is the zero set of $h$ are detailed in [PV05]. It is a determinantal process whose kernel
is the Bergman kernel $\pi^{-1}(1 - z\bar{w})^{-2}$. It is invariant under Möbius transformations of the unit disk
and has intensity measure $\pi^{-1}/(1 - |z|^2)^2$. Endowing the unit disk with the hyperbolic metric, the
Möbius transformations become isometries, whence $Z$ is hyperbolic isometry invariant.

Fix $\rho < 1$ and $r > 0$ and let $f$ count the number of zeros of the restriction $Z_\rho$ of $Z$ to the disk of
radius $\rho$ that are "hyperbolically lonely", meaning that no other point of $Z_\rho$ is within a hyperbolic
distance $r$. Let $c_r$ denote the maximum number of points at mutual hyperbolic distance at least $r$ that may
be placed in a disk of hyperbolic radius $r$. Arguing as in Example 5.12 we see that $f$ is Lipschitz
with constant $c_r$. The mean number of points in $Z_\rho$ is $\rho^2/(1 - \rho^2)$ which for simplicity we can bound
from above by $1/(1 - \rho^2)$. An application of Theorem 3.6 to $f/c_r$ now yields

$$\mathbb{P}(|f - \mathbb{E}f| > a) \le 5 \exp\left( - \cdots \right) .$$